Assignment for ggplot2

For this optional assignment we will be recreating this plot from The Economist:



Feel free to work through as little or as much as you want. We haven't covered everything that you will need to reproduce this plot, but we have covered the skills needed for you to find and learn what you need from the documentation. Best of luck and have fun!

What to do

  • This assignment will be very challenging! You will recreate this plot by following the steps outlined in bold below. You will need to reference documentation! There are things in this plot that we have purposefully not covered to test your skills in going to the documentation and referencing what you need to know. Links and hints will be provided along the way!

Let's get started!

Import the ggplot2 and data.table libraries and use fread to load the csv file 'Economist_Assignment_Data.csv' into a dataframe called df (Hint: use drop=1 to skip the first column)

In [3]:
# Code
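
If the drop argument is new to you, here is a minimal sketch of what it does. It does not touch the assignment file: the temporary CSV, its row column, and the name demo are made up for illustration, and only two rows are written.

```r
library(data.table)

# Write a tiny CSV whose first column is just a row index (hypothetical demo file).
tmp <- tempfile(fileext = ".csv")
writeLines(c("row,Country,HDI,CPI",
             "1,Afghanistan,0.398,1.5",
             "2,Albania,0.739,3.1"), tmp)

# drop = 1 tells fread to skip the first column while reading.
demo <- fread(tmp, drop = 1)
names(demo)
```

The same call pattern should carry over to the real file, as long as the column you want to skip is the first one.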

Check the head of df

In [4]:
# Code
Out[4]:
  Country      HDI.Rank  HDI    CPI  Region
1 Afghanistan  172       0.398  1.5  Asia Pacific
2 Albania      70        0.739  3.1  East EU Cemt Asia
3 Algeria      96        0.698  2.9  MENA
4 Angola       148       0.486  2    SSA
5 Argentina    45        0.797  3    Americas
6 Armenia      86        0.716  2.6  East EU Cemt Asia

Use ggplot() + geom_point() to create a scatter plot object called pl. You will need to specify x=CPI and y=HDI and color=Region as aesthetics

In [5]:
# Code
In [6]:
pl

Change the points to be larger empty circles. (You'll have to go back and add arguments to geom_point() and reassign it to pl.) You'll need to figure out which values to pass to shape= and size=.

In [7]:
# Code
pl
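
One possible way to get larger empty circles is sketched below. The data frame demo only holds the six rows from the head of df shown earlier, and size = 4 is an arbitrary choice, not a value taken from the original plot.

```r
library(ggplot2)

# Six rows copied from the head of df; a stand-in for the full data set.
demo <- data.frame(
  CPI = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  Region = c("Asia Pacific", "East EU Cemt Asia", "MENA",
             "SSA", "Americas", "East EU Cemt Asia")
)

# shape = 1 is the open-circle point shape; size = 4 is an assumed value.
p <- ggplot(demo, aes(x = CPI, y = HDI, color = Region)) +
  geom_point(shape = 1, size = 4)
p
```

Because shape and size are set outside aes(), they apply to every point instead of being mapped to a column.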

Add geom_smooth(aes(group=1)) to add a trend line

In [8]:
# Code
geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

We want to further edit this trend line. Add the following arguments to geom_smooth (outside of aes):

  • method = 'lm'
  • formula = y ~ log(x)
  • se = FALSE
  • color = 'red'

For more info on these arguments, check out the geom_smooth() documentation; the Arguments list has the details.

Assign all of this to pl2

In [9]:
# Code
In [10]:
pl2
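
To see how these arguments fit together, here is a rough sketch on a six-row stand-in for df (the rows come from the head of df shown earlier; the names demo and p2 are made up). With method = 'lm' and formula = y ~ log(x), the line drawn is a linear fit of HDI on log(CPI), and aes(group = 1) makes it one line across all regions instead of one per color.

```r
library(ggplot2)

# Six rows copied from the head of df; a stand-in for the full data set.
demo <- data.frame(
  CPI = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716),
  Region = c("Asia Pacific", "East EU Cemt Asia", "MENA",
             "SSA", "Americas", "East EU Cemt Asia")
)

p2 <- ggplot(demo, aes(x = CPI, y = HDI, color = Region)) +
  geom_point(shape = 1, size = 4) +
  # group = 1 goes inside aes(); the remaining arguments are fixed settings.
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ log(x),
              se = FALSE, color = "red")
p2
```

Setting se = FALSE drops the shaded confidence band, so only the red trend line remains.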

It's really starting to look similar! But we still need to add labels, and for that we can use geom_text! Add geom_text(aes(label=Country)) to pl2 and see what happens. (Hint: there should be way too many labels.)

In [45]:
# Code

Labeling a subset is actually pretty tricky! So we're just going to give you the answer since it would require manually selecting the subset of countries we want to label!

In [12]:
pointsToLabel <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")

pl3 <- pl2 + geom_text(aes(label = Country), color = "gray20",
                       data = subset(df, Country %in% pointsToLabel),
                       check_overlap = TRUE)

pl3

Almost there! Still not perfect, but good enough for this assignment. Later on we'll see why interactive plots are better for labeling. Now let's just add some labels and a theme, set the x and y scales and we're done!

Add theme_bw() to your plot and save this to pl4

In [29]:
# Code

Add scale_x_continuous() and set the following arguments:

  • name = Same x axis as the Economist Plot
  • limits = Pass a vector of appropriate x limits
  • breaks = 1:10

In [38]:
# Code
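
As a hedged sketch of how the three arguments fit together: the axis title below is a placeholder (use the one from the Economist plot instead), and limits = c(1, 10) is only an assumed range, so pick limits that suit the real data. The data frame demo again holds just the six rows from the head of df.

```r
library(ggplot2)

# Six rows copied from the head of df; a stand-in for the full data set.
demo <- data.frame(
  CPI = c(1.5, 3.1, 2.9, 2.0, 3.0, 2.6),
  HDI = c(0.398, 0.739, 0.698, 0.486, 0.797, 0.716)
)

# Placeholder title and assumed limits; breaks = 1:10 comes from the step above.
x_scale <- scale_x_continuous(name = "Placeholder x-axis title",
                              limits = c(1, 10),
                              breaks = 1:10)

p4 <- ggplot(demo, aes(x = CPI, y = HDI)) +
  geom_point(shape = 1, size = 4) +
  theme_bw() +
  x_scale
p4
```

Points that fall outside the limits are dropped with a warning, which is worth keeping in mind when choosing the range.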

Now use scale_y_continuous to do similar operations to the y axis!

In [43]:
# Code

Finally use ggtitle() to add a string as a title.

In [44]:
# Code

Great Job!

That's it, you're done!